Sense and Reference Disambiguation in Wikipedia
Abstract
Wikipedia articles are annotated by volunteer contributors with numerous links that connect words and phrases to relevant titles in Wikipedia. In this paper, we identify inconsistencies in the user annotation of links and show that they can have a substantial impact on the performance of word sense disambiguation systems that are trained on Wikipedia links. We describe two major types of link annotations – sense and reference – that are frequently used without being explicitly distinguished in Wikipedia, and present an approach to training sense and reference disambiguation systems in the presence of such annotation inconsistencies. Experimental results demonstrate that accounting for annotation ambiguity in Wikipedia links leads to significant improvements in disambiguation accuracy.
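As a rough illustration of the kind of link annotation the abstract refers to, the sketch below pulls (anchor text, target title) pairs out of wikitext and groups the targets seen for each anchor. It is not the authors' pipeline: the regular expression covers only the simple `[[Target]]` and `[[Target|anchor]]` forms, the helper names `extract_link_annotations` and `group_by_anchor` are hypothetical, and the sample sentence is invented for this example.

```python
import re
from collections import defaultdict

# Matches [[Target]] and [[Target|anchor text]]; a "#section" suffix on the
# target is skipped. Assumption: templates, files and nested links are ignored.
LINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|([^\[\]]+))?\]\]")


def extract_link_annotations(wikitext):
    """Return a list of (anchor_text, target_title) pairs, in document order."""
    pairs = []
    for match in LINK_RE.finditer(wikitext):
        target = match.group(1).strip()
        # An unpiped link uses the title itself as the anchor text.
        anchor = (match.group(2) or target).strip()
        pairs.append((anchor, target))
    return pairs


def group_by_anchor(pairs):
    """Map each lowercased anchor to the set of titles it was linked to."""
    groups = defaultdict(set)
    for anchor, target in pairs:
        groups[anchor.lower()].add(target)
    return dict(groups)


# Invented sample: the same word is linked to two different titles.
sample = (
    "The [[Atmosphere of Earth|atmosphere]] protects life, while the "
    "[[Atmosphere|atmosphere]] of Mars is thin. See also [[Mars]]."
)
annotations = extract_link_annotations(sample)
senses = group_by_anchor(annotations)
```

An anchor that maps to more than one title, as `atmosphere` does here, is a candidate for the sort of annotation ambiguity the paper discusses; deciding whether two such links reflect different senses or inconsistent annotation is the part this sketch does not attempt.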
Similar resources
Word Sense Disambiguation Using Wikipedia
This paper describes explorations in word sense disambiguation using Wikipedia as a source of sense annotations. Through experiments on four different languages, we show that the Wikipedia-based sense annotations are reliable and can be used to construct accurate sense classifiers.
Using Wikipedia for Automatic Word Sense Disambiguation
This paper describes a method for generating sense-tagged data using Wikipedia as a source of sense annotations. Through word sense disambiguation experiments, we show that the Wikipedia-based sense annotations are reliable and can be used to construct accurate sense classifiers.
Improving Wikipedia Miner Word Sense Disambiguation Algorithm
This document describes improvements to the Wikipedia Miner word sense disambiguation algorithm. The original algorithm performs very well in detecting key terms in documents and disambiguating them against Wikipedia articles. By replacing the original measure, inspired by Normalized Google Distance, with a measure inspired by the Jaccard coefficient, and by taking additional features into account, the disam...
Automatic Identification and Disambiguation of Concepts and Named Entities in the Multilingual Wikipedia
In this paper we present an automatic multilingual annotation of the Wikipedia dumps in two languages, with both word senses (i.e. concepts) and named entities. We use Babelfy 1.0, a state-of-the-art multilingual Word Sense Disambiguation and Entity Linking system. As its reference inventory, Babelfy draws upon BabelNet 3.0, a very large multilingual encyclopedic dictionary and semantic network...
Wikipedia Mining for Triple Extraction Enhanced by Co-reference Resolution
Since Wikipedia has become a huge-scale database storing a wide range of human knowledge, it is a promising corpus for knowledge extraction. A considerable amount of research on Wikipedia mining has been conducted, and the fact that Wikipedia is an invaluable corpus has been confirmed. Wikipedia’s impressive characteristics are not limited to its scale, but also include the dense link structure...